1  Introduction

RMIT participated in the 2008 Enterprise Track document search task. Our experiments
investigated the use of local outdegree, and whether this can improve the ranking quality
of a search result list.

Unlike  global  outdegree,  which  counts  the  number  of  out-links  of  a  page  that  point  to  any  other 
pages  in  a  collection,  local  outdegree  only  counts  the  out-links  that  point  to  pages  contained  in  a 
search  result  list.  Intuitively,  restricting  the  outdegree  to  the  result  set  of  a  query  transforms  this 
source  of  evidence  from  something  general  into  a  topically-focused  source  of  information,  and  may 
help  to  reduce  the  problem  of  topic  shift. 
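The distinction between the two counts can be sketched in a few lines of Python. This is only an illustration, not the code used for the runs: the `links` adjacency map, the function names, and the decision to ignore self-links are all assumptions made for the example.

```python
def global_outdegree(links, doc):
    """Count all out-links of `doc` that point to other pages in the collection."""
    # Assumption: a page linking to itself is not counted.
    return len(links.get(doc, set()) - {doc})


def local_outdegree(links, doc, result_list):
    """Count only the out-links of `doc` whose targets appear in the result list."""
    results = set(result_list)
    return len((links.get(doc, set()) & results) - {doc})


# Hypothetical link graph: page id -> set of page ids it links to.
links = {
    "a": {"b", "c", "d"},
    "b": {"a"},
    "c": {"x", "y"},
}
results = ["a", "b", "c"]  # hypothetical result list for one query

local = {doc: local_outdegree(links, doc, results) for doc in results}
glob = {doc: global_outdegree(links, doc) for doc in results}
print(local, glob)
```

In this toy graph, page "c" has two out-links globally but none into the result list, so its local outdegree is zero even though its global outdegree is not.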

For our experiments, we used the Zettair search engine¹ to index and search the CSIRO
collection used for the 2008 Enterprise Track. This collection is a crawl of the public-facing
web of the Australian Commonwealth Scientific and Industrial Research Organization (CSIRO) in
2007 (Bailey et al., 2007). Document weights were calculated using the Okapi BM25 similarity
function (Sparck Jones et al., 2000), with query words being terms from the query fields of the
track topics. During indexing and search, words are stemmed and stopped.²
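For readers unfamiliar with Okapi BM25, the following is a minimal sketch of one common form of the document-side scoring function. It is not Zettair's implementation: the parameter values `k1=1.2` and `b=0.75` are conventional defaults assumed here, the query-term frequency component is omitted, and all names and statistics are hypothetical.

```python
import math


def bm25_score(query_terms, doc_tf, doc_len, avg_doc_len, df, num_docs,
               k1=1.2, b=0.75):
    """Sum a BM25 contribution for every query term that occurs in the document.

    doc_tf: term -> frequency in this document
    df:     term -> number of documents in the collection containing the term
    k1, b:  assumed default parameters, not values taken from the paper
    """
    score = 0.0
    for term in query_terms:
        tf = doc_tf.get(term, 0)
        n = df.get(term, 0)
        if tf == 0 or n == 0:
            continue
        # Inverse document frequency component.
        idf = math.log((num_docs - n + 0.5) / (n + 0.5))
        # Length-normalised term frequency component.
        norm = k1 * ((1 - b) + b * doc_len / avg_doc_len)
        score += idf * (k1 + 1) * tf / (norm + tf)
    return score


# Hypothetical collection statistics and two documents of average length.
df = {"solar": 50}
high = bm25_score(["solar"], {"solar": 3}, 100, 100, df, num_docs=1000)
low = bm25_score(["solar"], {"solar": 1}, 100, 100, df, num_docs=1000)
none = bm25_score(["solar"], {"wind": 2}, 100, 100, df, num_docs=1000)
print(high, low, none)
```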

2  Description  Of  Runs 

We submitted four runs to the 2008 Enterprise Track: a baseline, two variants using local outdegree,
and a pseudo relevance feedback approach:

•  RmitDocQ:  Baseline  run. 

• RmitDQComLO: The top 1000 retrieved documents from RmitDocQ are re-ranked based
on a linear combination of document weight and local outdegree:

$$\text{weight} = \alpha \cdot \text{similarity} + (1 - \alpha) \cdot \text{outdegree}$$

• RmitDocQRerank: The top 100 retrieved documents from the RmitDocQ run are re-ranked
using local outdegree.

¹ Zettair is available under a BSD License from: http://www.seg.rmit.edu.au/zettair

² The stoplist used is available from: http://www.csse.unimelb.edu.au/~jz/resources/stopping.zip


• RmitDQExp: For each query, the top 10 retrieved documents from the run RmitDocQ are
treated as "relevant" documents. Terms in this selected set of documents are weighted
according to the following term selection value:

$$TSV = w^{(1)} \times \frac{r}{R}$$

The weight $w^{(1)}$ is the Robertson/Sparck Jones weighting function; r is the number of
selected documents which contain a term, and R is the number of documents in the set. The
top 10 terms are then selected and combined with the original query terms to form a new query,
with the original query terms being up-weighted by a factor of 3.
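The two re-scoring steps described in the run list can be sketched as follows. This is a hedged illustration rather than the code behind the submitted runs. Several details are assumptions because the text does not specify them: similarity and local outdegree are each divided by their maximum before being combined, the mixing parameter defaults to 0.5, and the Robertson/Sparck Jones weight uses the usual 0.5 smoothing. All function names and data are hypothetical.

```python
import math


def rerank_linear(ranked, local_outdeg, alpha=0.5):
    """Re-rank (doc_id, similarity) pairs by alpha*similarity + (1-alpha)*outdegree.

    Assumption: both sources of evidence are max-normalised to [0, 1] first.
    """
    max_sim = max((sim for _, sim in ranked), default=0.0) or 1.0
    max_out = max((local_outdeg.get(doc, 0) for doc, _ in ranked), default=0) or 1
    combined = [
        (doc, alpha * sim / max_sim + (1 - alpha) * local_outdeg.get(doc, 0) / max_out)
        for doc, sim in ranked
    ]
    return sorted(combined, key=lambda pair: pair[1], reverse=True)


def rsj_weight(r, R, n, N):
    """Robertson/Sparck Jones relevance weight with 0.5 smoothing (assumed form)."""
    return math.log(((r + 0.5) * (N - n - R + r + 0.5))
                    / ((n - r + 0.5) * (R - r + 0.5)))


def select_expansion_terms(feedback_docs, df, num_docs, k=10):
    """Rank terms from the pseudo-relevant documents by TSV = w1 * r / R."""
    R = len(feedback_docs)
    r_counts = {}
    for terms in feedback_docs:
        for term in set(terms):
            r_counts[term] = r_counts.get(term, 0) + 1
    tsv = {t: rsj_weight(r, R, df[t], num_docs) * r / R for t, r in r_counts.items()}
    return sorted(tsv, key=tsv.get, reverse=True)[:k]


def expand_query(original_terms, expansion_terms, boost=3.0):
    """Build a weighted query; original terms are up-weighted by `boost`."""
    query = {term: 1.0 for term in expansion_terms}
    query.update({term: boost for term in original_terms})
    return query


# Hypothetical data for illustration only.
ranked = [("d1", 10.0), ("d2", 8.0), ("d3", 6.0)]
outdeg = {"d1": 0, "d2": 4, "d3": 1}
reranked = [doc for doc, _ in rerank_linear(ranked, outdeg, alpha=0.5)]

feedback = [["solar", "energy"], ["solar", "panel"], ["solar", "energy", "cost"]]
df = {"solar": 20, "energy": 200, "panel": 30, "cost": 500}
top_terms = select_expansion_terms(feedback, df, num_docs=1000, k=2)
new_query = expand_query(["solar"], top_terms)
print(reranked, top_terms, new_query)
```

Setting `alpha=1.0` in `rerank_linear` reproduces the baseline order, which gives a simple sanity check when trying other values.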


3  Results 

As  shown  below,  on  average  our  runs  proved  unsuccessful.  From  an  initially  effective  baseline,  all 
techniques  reduced  the  mean  inferred  average  precision  and  mean  inferred  NDCG  of  the  run. 


Run            Mean infAP   Mean infNDCG
TREC median    0.2670       0.4679
RmitDocQ       0.2975       0.5040
RmitDQCombLO   0.2837       0.4970
RmitDQRerank   0.2644       0.4810
RmitDQExp      0.2640       0.4399

However, when considering individual query performance, the results are varied. Each
approach improved some topics, while hindering the performance of others. The RmitDQComLO
run, which combined local outdegree with document weight, performed best, with a positive effect
on the average precision of 26 topics and a negative effect on 31 topics. RmitDQRerank, which
placed more emphasis on local outdegree, had a less noticeable effect, with an improvement in
average precision for only 12 topics, and a negative outcome for 20 topics. Surprisingly, our query
expansion run RmitDQExp resulted in the worst performance, with 43 topics decreasing in average
precision, and only 17 increasing.
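A per-topic comparison of this kind can be computed with a short script such as the sketch below. The topic identifiers and scores are invented for illustration and are not the track results; the function name and the tolerance `eps` used to treat near-equal scores as unchanged are also assumptions.

```python
def compare_per_topic(baseline_ap, run_ap, eps=1e-9):
    """Count topics where a run improves, hurts, or leaves average precision unchanged.

    baseline_ap, run_ap: topic id -> average precision for that topic.
    A topic missing from `run_ap` is treated as scoring 0.0 (assumption).
    """
    improved = hurt = unchanged = 0
    for topic, base in baseline_ap.items():
        delta = run_ap.get(topic, 0.0) - base
        if delta > eps:
            improved += 1
        elif delta < -eps:
            hurt += 1
        else:
            unchanged += 1
    return improved, hurt, unchanged


# Hypothetical per-topic scores, not values from the track.
baseline = {"T1": 0.30, "T2": 0.50, "T3": 0.10, "T4": 0.25}
variant = {"T1": 0.35, "T2": 0.40, "T3": 0.10, "T4": 0.20}
counts = compare_per_topic(baseline, variant)
print(counts)
```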

The results suggest that, as with many other proposed query evaluation improvement techniques,
there are potential gains to be achieved. However, understanding when to apply local outdegree
factors to query evaluation, and how to best utilize the information, remains an open question.